Automated detection of anomalous user activity associated with specific items in an electronic catalog

ABSTRACT

An anomaly detection engine monitors network traffic to detect orders placed by users from an electronic catalog of items, aggregates data about the detected orders by time period, and analyzes the aggregated data to detect anomalies in activity levels associated with specific items in the catalog. To detect whether an anomaly exists in the activity data associated with a given item, a forecasting algorithm, such as an exponential smoothing algorithm, is used to generate an expected order volume for a current time period, and the expected order volume is compared to an actual order volume. Other criteria may also be taken into consideration. If an anomaly is detected, such as a sharp increase in the item's order volume, the anomaly detection engine generates an alert message to notify a catalog administrator, who may then determine whether the anomaly is attributable to an erroneous item description in the catalog.

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to computer-implemented processes for efficiently detecting anomalous user activity associated with specific items, such as items in an electronic catalog. The detected anomalies may, for example, be attributable to, and may be used to correct, errors in an electronic catalog.

2. Description of the Related Art

It has become common for businesses to set up web sites, and other types of interactive computer systems, to automate the process of accepting orders from users. Information about the items that can be ordered via such a system is typically disseminated to users via a browsable electronic catalog. While browsing the electronic catalog, users can typically select one or more items to purchase, rent, or otherwise acquire, and then place an order for these items. The ordered items may, for example, be shipped to the user from a distribution center, made available for local pick-up, or transmitted to the user electronically.

One problem with this type of system is that a large number of users can rely on, or take advantage of, a typographical or other error in the electronic catalog before the error is detected and corrected by authorized personnel. As a result, a single error, such as an error in the price of an item, can result in a significant loss of revenue to an online merchant. One potential solution to this problem is to set up a computer system that analyzes each order to evaluate whether it represents a significant departure from current trends. Due to the computational burden associated with this approach, however, it is not well suited for systems that process large numbers of orders (e.g., hundreds or thousands of orders per minute) placed from a catalog that includes a large number of items (e.g., millions of items).

SUMMARY OF THE INVENTION

The present invention comprises a system that detects anomalous user activity associated with specific items in an electronic catalog. The system may, for example, be implemented using a computer system, such as a general-purpose computer, that passively monitors orders placed by users of the electronic catalog. The system is suitable for use in an electronic catalog system that, for example, receives thousands of orders per minute from a catalog that includes millions of items.

In one embodiment, the system includes a data repository that stores aggregated data about orders placed from an electronic catalog. The aggregated data may be arranged by time period, where each time period may, for example, have a duration of one hour. To analyze the aggregated data associated with a current time period (e.g., the last hour), an analyzer selects, from a set of items ordered during the current time period, a subset of items for which to conduct an anomaly analysis. The subset may, for example, be selected based on the quantity of each item ordered during the current time period and/or other criteria. By limiting the analysis to a selected subset of items, the analyzer controls the processing load associated with the anomaly detection process.

For each item in the subset, the analyzer uses order volume data from prior time periods to generate a forecasted or expected order volume for the current time period. An exponential smoothing algorithm may be used for this purpose. In one embodiment, the order volume for each item is specified in terms of the total quantity of the item ordered in the relevant time period, although other metrics reflective of the demand for the item, such as total number of distinct users that order the item, or total number of orders received for one or more units of the item, may additionally or alternatively be used. To determine whether an item's order activity or demand during the current time period is anomalous, the actual order volume associated with the item is compared to the item's forecasted order volume. Other criteria, such as the number of distinct users that ordered the item during the current time period, may also be taken into consideration.

If the analyzer determines that an anomaly exists in the order activity data for a given item, an alert message is generated and sent to an associated catalog administrator, such as an administrator responsible for a corresponding product category. The alert message may include a hyperlink to an associated catalog page to enable the administrator to efficiently evaluate whether the detected anomaly is attributable to an erroneous catalog description of the item. The alert message may also provide an option (e.g., a set of buttons or links) for the message recipient to provide feedback on whether the anomaly was properly detected. In embodiments that provide such a feedback option, the feedback may be used, on an item-by-item or other basis, to adaptively adjust the sensitivity of an anomaly detection algorithm used by the analyzer.

The invention may also be used where some or all of the orders are placed without the use of an electronic catalog. For example, the invention is applicable to systems that accept orders from recipients of a paper catalog that describes items that can be purchased.

One aspect of the invention is thus a system for detecting anomalous user activity associated with items in a catalog. The system comprises a data repository that stores aggregated data descriptive of orders placed by users from a catalog of items, with the aggregated data arranged by time period. A forecasting module analyzes item demand levels in prior time periods on an item-by-item basis, as indicated by the aggregated data, to predict demand levels for respective items in a current time period. The item demand levels may, for example, be measured and predicted in terms of total quantity of item ordered per time period. An anomaly detection module detects anomalies associated with specific items in the catalog, at least in part, by comparing the demand levels predicted by the forecasting module to corresponding observed demand levels. A reporting module generates alert messages to notify catalog administrators of items for which anomalies are detected by the anomaly detection module.

Neither this summary nor the following detailed description purports to define the invention. The invention is defined by the claims.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 illustrates an electronic catalog system that includes an anomaly detection engine according to one embodiment of the invention.

FIG. 2 illustrates a graph that depicts an anomaly in the order activity associated with a particular item in a catalog.

FIG. 3 illustrates an example of an email message that may be sent to notify catalog administrators of detected anomalies.

FIG. 4 illustrates a sequence of steps performed by the anomaly detection engine to analyze order data collected over a period of time.

FIG. 5 illustrates one example of how relevance feedback may be taken into consideration to evaluate potential anomalies.

DETAILED DESCRIPTION OF SPECIFIC EMBODIMENTS

FIG. 1 illustrates an electronic catalog system 30 that includes an anomaly detection engine 32 according to one embodiment of the invention. The electronic catalog system 30 includes a catalog-based order acquisition system 34 that is accessible via a computer network, such as the Internet. The order acquisition system 34 provides functionality for users to browse and order items from an electronic catalog of items using one or more different types of devices, such as personal computers 38, Personal Digital Assistants (PDAs) 40, telephones 42, and/or interactive televisions 44. The order acquisition system 34 may, for example, be in the form of a World Wide Web site that serves web pages in accordance with the Hypertext Transfer Protocol (HTTP), an interactive television system, a telephone-based system that supports browsing by voice (e.g., using VoiceXML pages), an online services network that uses proprietary client software, or any combination thereof.

As depicted in FIG. 1, the order acquisition system 34 includes an items database 46 that stores information about items that may be ordered (for purchase, rental, etc.) from the electronic catalog. The items may, for example, include physical products that are shipped to users, digital works that are transferred to users electronically, hotel and car rental packages, vacation packages, airline tickets, tickets to events, magazine subscriptions, computer programs, gift cards, stocks and bonds traded on an exchange, and/or other types of items that may be ordered online. The information stored for each item typically includes the item's price and availability and a textual description of the item, and may also include a photo of the item, customer ratings and reviews, and other types of information commonly found in an electronic catalog. In a commercial implementation of the system, many tens of millions of different items falling within thousands of different item categories are represented in the items database 46 and are available for purchase via the electronic catalog. Although depicted as a single database, the items database 46 may actually include multiple distinct databases.

Some or all of the information stored in the items database 46 for a given item is disseminated to users as part of the electronic catalog, such as on item detail pages of a web site. Updates to the catalog are made by updating the items database 46. The updates may include item additions and deletions, and changes to various item attributes (price, availability, description, photo, average customer review, etc.). The updates may come from various sources, such as catalog administrators, suppliers, merchants that sell items via the electronic catalog, or an inventory management system.

Errors in the item information supplied by any of the sources of item information may result in an error in the catalog. Examples of the types of errors that can occur include erroneous price information, erroneous availability information (e.g., a not-yet-released item is listed as being available), and erroneous descriptions of product features (e.g., a 2-megapixel camera is listed as a 4-megapixel camera). As discussed below, the anomaly detection engine 32 rapidly identifies anomalous user behavior suggestive of these and other types of catalog errors. The anomaly detection engine 32 may also be used to detect fraudulent user activity.

The order acquisition system 34 also includes a users database 50 that stores information about users that have registered with the system 30. The information stored for a given user may include, for example, a username and password, shipping information, payment information, and a history of orders placed by the user.

As illustrated in FIG. 1, orders placed by users via the order acquisition system 34 are passed over a computer network to an order processing pipeline 52. A given order may include multiple items, and may include multiple units of a given item. In a commercial implementation of the system 30, many hundreds to thousands of orders are typically received and processed per minute, and many tens to hundreds of thousands of different items are typically ordered within a given one-hour time period.

The order processing pipeline 52 is responsible for collecting payments from users, such as by charging a user's credit card upon shipment of a set of ordered items. In the case of physical products, the order processing pipeline 52 may also select one or more distribution centers from which to ship the ordered items, and may provide associated messaging and order tracking for purposes of order fulfillment. In some embodiments, some or all of the orders may be fulfilled by a business entity other than the entity that operates the electronic catalog system 30. For instance, the electronic catalog system 30 may acquire orders and collect payments for many different merchants.

The primary components of the anomaly detection engine 32, in the illustrated embodiment, are a cache 60 that stores and aggregates information about recently placed orders, a listener 62 that populates the cache 60 as orders are placed by users, and an analyzer 64 that analyzes aggregated data stored in the cache to detect anomalous user behavior associated with specific catalog items. The anomaly detection engine 32 also includes an anomalies database 68 that stores information about detected anomalies. In addition, the anomaly detection engine 32 includes a reporting component 70 that sends alert messages to catalog administrators (represented by block 74, which depicts the computers of the administrators). The reporting component 70 may also provide functionality for administrators to interactively generate charts and reports of information stored in the anomalies database 68. The cache 60 and the anomalies database 68 may be implemented using any type of data repository.

In one embodiment, the anomaly detection engine 32 is implemented entirely within software executed by a single, general-purpose computer. Because the anomaly detection engine 32 uses highly efficient data processing algorithms, this single computer is capable of detecting anomalies substantially in real time with a sustained order rate of over 10³ orders per minute and a catalog size of over 10⁸ items. Although a single computer may be used, the anomaly detection engine 32 may alternatively be implemented using two or more computers.

The operation of the anomaly detection engine 32 will now be described with reference to FIG. 1. A more detailed description of the analysis steps performed by the anomaly detection engine 32 is provided below with reference to FIG. 4.

As depicted in FIG. 1, the cache 60 includes two primary types of database tables: a “recent orders” table 80 and a set of aggregation tables 82. The recent orders table 80 stores detailed information about orders recently placed by users. This table 80 is populated by the listener 62, which passively monitors network traffic to detect transmissions by the order acquisition system 34 of messages describing new orders. Information about recent orders may alternatively be obtained from another source, such as by periodically querying a database used for order fulfillment. In one embodiment, the recent orders table 80 only stores information about orders placed by users over the preceding hour. The information stored in the table 80 for each order may include the item ID, price, and quantity of each ordered item, and an identifier of the user that placed the order.

Each aggregation table 82 stores aggregated information about orders placed during a respective, one-hour time period, such that the orders placed during a single day are effectively divided among twenty-four one-hour “buckets.” Aggregation tables that represent smaller or larger time periods may alternatively be used. For example, time periods falling in the range of one minute to six hours, and more typically in the range of twenty minutes to three hours, may be used. Although multiple aggregation tables 82 are shown in FIG. 1 for purposes of illustration, a single aggregation table may be used to store all of the aggregated data. For example, in one embodiment, a single aggregation table 82 is used to store a rolling month's worth of data, which is aggregated using one-hour time periods.

Each aggregation table 82 includes one entry (row) for each item ordered during the corresponding constituent time period. As illustrated, each such entry contains the ID of the item, the total quantity of that item ordered over the corresponding one-hour time period, and the number of distinct users that ordered the item during that time period. In one embodiment, aggregation tables 82 are maintained in the cache 60 for user activity occurring over the preceding thirty days. As depicted in FIG. 1, a cache manager 86 periodically generates a new aggregation table 82 from data stored in the recent orders table 80. The cache manager 86 may also be responsible for purging aged data from the cache 60.

In some embodiments of the invention, the analyzer 64 takes item prices into consideration for purposes of detecting anomalies. In these embodiments, the cache manager 86 may also use the data read from the recent orders table to maintain an item price histories table 88. The item price histories table 88 may, for example, store a history of up to the last X (e.g., 3) price changes detected for each item in the catalog. Information about recent item prices, if used, may alternatively be obtained from another source.

The analyzer 64 may be invoked each time a new aggregation table 82 is generated in order to search for anomalies in order activity data recorded therein. As illustrated in FIG. 1, the analyzer 64 includes three functional blocks or modules, each of which may be implemented in software: a problem space reduction module 92, a forecasting module 94, and an anomaly detection or “filtering” module 96. Each of these modules 92-96 corresponds to a respective phase of the analysis process.

The problem space reduction module 92 is responsible for selecting, from the set of items ordered during the current time period, a relatively small subset of items for which to conduct a forecasting and anomaly detection analysis. The purpose of the problem space reduction phase is to reduce the processing burden associated with the forecasting and anomaly detection phases to an acceptable level, such as a level which permits the analysis of a one-hour bucket to be completed in less than one hour. In one embodiment, the problem space reduction module 92 selects a total of N items from one or both of the following groups, where N is a selected integer such as 200 or 500:

1. The items ordered the most frequently during the current time period, or during some other time period such as the last three hours; and

2. The items for which (total quantity ordered during current time period) × (recent item price) is the highest.

Group 1 is based primarily on the assumption that the items for which the most serious catalog errors exist, such as severe pricing errors that are favorable to customers, will likely experience the highest levels of order activity. Group 2, on the other hand, focuses on relatively high cost, low volume items, since catalog errors associated with these items can be very costly even at relatively low volumes. Because the current price in the catalog may be erroneous, a recent item price is used in the calculation for group 2. The recent item price may be obtained from the item price histories table 88 or some other source of price information.

In embodiments in which order volumes are sufficiently low, and/or computing resources are sufficiently high, the anomaly analysis may be performed in connection with all items ordered during the current time period. In such embodiments, the problem space reduction module 92 may be omitted or disabled.
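The selection of N items from the two groups can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation described in the specification: the function name `select_candidates`, the dictionary inputs, and the choice to alternate between the two rankings are hypothetical, and total quantity ordered stands in for "ordered the most frequently".

```python
from itertools import zip_longest
from typing import Dict, List


def select_candidates(quantities: Dict[str, int],
                      recent_prices: Dict[str, float],
                      n: int) -> List[str]:
    """Return up to n item IDs drawn alternately from group 1 and group 2."""
    # Group 1: items with the highest quantity ordered in the current period.
    by_quantity = sorted(quantities, key=lambda i: quantities[i], reverse=True)
    # Group 2: items with the highest (quantity x recent price).
    # Items without a known recent price are skipped (an assumption).
    by_value = sorted(
        (i for i in quantities if i in recent_prices),
        key=lambda i: quantities[i] * recent_prices[i],
        reverse=True,
    )
    selected: List[str] = []
    # Interleave the two rankings so both groups contribute (an assumption).
    for pair in zip_longest(by_quantity, by_value):
        for item in pair:
            if item is not None and item not in selected and len(selected) < n:
                selected.append(item)
    return selected


if __name__ == "__main__":
    qty = {"A": 50, "B": 2, "C": 10, "D": 1}
    price = {"A": 1.0, "B": 900.0, "C": 5.0, "D": 20.0}
    print(select_candidates(qty, price, 2))
```

In the usage example, item A leads on raw quantity while item B leads on quantity times price, so both are picked even though B sold only two units.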

As depicted in FIG. 1, the IDs of the items selected by the problem space reduction module 92 are passed to the forecasting module 94. For each selected item, the forecasting module 94 uses the data stored in the aggregation tables 82 from prior time periods to forecast or predict the total order quantity for the current time period. (As described below, the forecasting module 94 may alternatively predict the number of distinct users to order the item during the current time period.) The forecasted or predicted item quantities for the current time period may be generated either before or after the current time period has ended. Thus, the terms “forecast” and “predict,” as used herein, are not intended to imply that the forecasted quantities are necessarily generated before the corresponding actual quantities are known. Whether generated before or after the fact, a “forecast” or “prediction” of what should ordinarily happen (or have happened) can be compared to what actually does (or did) happen.

In one embodiment, the forecasting module 94 uses an exponential smoothing algorithm, such as a single, double or triple exponential smoothing algorithm, to generate the forecasted item quantities. Exponential smoothing algorithms give exponentially decreasing weight to data values from progressively earlier time periods. Thus, for example, to predict an item's order quantity for the current time period, or “t,” the greatest weight would be given to the item's quantity value from the immediately preceding time period, t−1, and exponentially decreasing weight would be given to the quantity values from time periods t−2, t−3, and so on. Although an exponential smoothing algorithm is used in the illustrated embodiment, other types of time series forecasting algorithms may be used, such as single and double moving average, Holt-Winters, and multiple linear regression algorithms.
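The weighting behavior of single exponential smoothing can be illustrated with a short Python sketch. The function names and the choice to seed the smoothed value with the first observation are assumptions made for illustration; the smoothing constant is left as a parameter.

```python
from typing import List, Sequence


def smoothing_weights(alpha: float, periods: int) -> List[float]:
    """Weights given to periods t-1, t-2, ... by single exponential smoothing."""
    return [alpha * (1 - alpha) ** k for k in range(periods)]


def single_exponential_forecast(history: Sequence[float], alpha: float) -> float:
    """Forecast the next period from hourly quantities (oldest first)."""
    if not history:
        raise ValueError("history must contain at least one observation")
    # Seeding with the first observation is a common convention (assumption).
    smoothed = float(history[0])
    for y in history[1:]:
        smoothed = alpha * y + (1 - alpha) * smoothed
    return smoothed


if __name__ == "__main__":
    print(smoothing_weights(0.8, 3))                      # weights for t-1, t-2, t-3
    print(single_exponential_forecast([4, 5, 6, 5], 0.8))
```

Each step back in time multiplies the weight by (1 − alpha), which is why the most recent period dominates the forecast.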

FIG. 2 illustrates an example set of quantity data values collected over a one week period of time for a particular item. Each data value represents the item's total order quantity for a corresponding one-hour period of time. In this example, a sharp increase in the hourly order quantity occurred just before the date 12/16, indicating a possible catalog error. When aberrations of this type occur, the actual quantity will typically deviate significantly from the forecasted quantity. In this particular example, the item at issue was a gift card, and the anomalous user activity was the result of a catalog error that allowed users to purchase the relevant item at a significant discount. In some cases, anomalies of the type shown in FIG. 2 are the result of other problems, such as fraudulent user activity; for example, an unauthorized distributor of an item may be attempting to purchase a large number of units to re-sell.

Referring again to FIG. 1, the forecasted quantities for the N selected items are passed to the anomaly detection module 96, which determines which of these items, if any, experienced anomalous order activity. The anomaly detection module 96 evaluates whether anomalies exist, at least in part, by comparing the forecasted quantity values to the observed or actual quantity values. (In embodiments in which the distinct number of users to purchase each item is forecasted, the anomaly detection module may alternatively compare the forecasted numbers of users to the actual numbers of users.) For example, an anomaly may be deemed to exist for a given item if its actual quantity for the current time period exceeds the forecasted quantity by more than a selected threshold, such as 20%. (Aberrations in which the actual quantity is less than the predicted quantity may be ignored.) One or more additional types of data may also be taken into consideration in determining whether to treat the current activity as an anomaly, such as (a) the number of distinct users that ordered the item during the current period, (b) the price of the item, and/or (c) the quantity ordered during the current period (see examples below).

In one embodiment, the anomaly detection module 96 uses a set of one or more thresholds to determine, for each selected item, whether an anomaly exists. By way of example and not limitation, an anomaly may be deemed to exist if and only if the following three conditions are met:

1. actual quantity / forecasted quantity > 1.2;

2. actual quantity > 5; and

3. actual quantity × recent price > $1000.

The second of these three conditions filters out those items for which the low volume of orders is likely to produce statistically inaccurate forecasting results. The third condition filters out those items for which the potential monetary loss over the current time period falls below a selected threshold. The actual threshold values used for these and other conditions may vary by type or category of product. In addition, different thresholds may be used based on the time of day (e.g., greater variations may be permitted during peak periods).
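The three example conditions can be expressed as a single predicate. The sketch below is illustrative only: the function name, parameter names, and the handling of a zero forecast are assumptions, while the default thresholds are the example values given above.

```python
def is_anomalous(actual_qty: float,
                 forecast_qty: float,
                 recent_price: float,
                 min_ratio: float = 1.2,
                 min_qty: float = 5,
                 min_value: float = 1000.0) -> bool:
    """Return True only if all three example conditions hold."""
    # Condition 1, written without division so that a zero forecast does not
    # raise; treating any positive quantity as exceeding a zero forecast is
    # an assumption not stated in the text.
    exceeds_forecast = actual_qty > min_ratio * forecast_qty
    # Condition 2: enough volume for the forecast to be meaningful.
    enough_volume = actual_qty > min_qty
    # Condition 3: enough money at stake in the current period.
    enough_value = actual_qty * recent_price > min_value
    return exceeds_forecast and enough_volume and enough_value


if __name__ == "__main__":
    print(is_anomalous(actual_qty=40, forecast_qty=10, recent_price=50.0))
```

Because the thresholds are parameters, per-category or time-of-day values of the kind mentioned above could be supplied by the caller.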

In another embodiment, a scoring algorithm is used to generate a respective score for each of the N selected catalog items. By way of example and not limitation, a score may be generated for each item according to the following equation:

score = 10 × (actual quantity / forecasted quantity) + 10 × (no. distinct users who order the item) + 100 × (avg. order size)    (Equation 1)

The score may be compared to one or more thresholds to evaluate whether, or the extent to which, the associated user activity is anomalous. For example, scores in the range of 0 to 500 may be treated as normal, scores in the range of over 500 to 1000 may be treated as revealing a medium risk anomaly, and scores above 1000 may be treated as revealing a high risk anomaly.
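Equation 1 and the example score bands can be sketched as follows. The function names and the decision to pass the average order size in as an argument are assumptions, since the text does not say how that value is computed; a zero forecast is rejected rather than guessed at.

```python
def anomaly_score(actual_qty: float,
                  forecast_qty: float,
                  distinct_users: int,
                  avg_order_size: float) -> float:
    """Evaluate Equation 1 for one item."""
    if forecast_qty <= 0:
        raise ValueError("forecast_qty must be positive to form the ratio")
    return (10 * (actual_qty / forecast_qty)
            + 10 * distinct_users
            + 100 * avg_order_size)


def risk_level(score: float) -> str:
    """Map a score onto the example bands: normal, medium, high."""
    if score > 1000:
        return "high"
    if score > 500:
        return "medium"
    return "normal"


if __name__ == "__main__":
    s = anomaly_score(actual_qty=100, forecast_qty=10,
                      distinct_users=60, avg_order_size=2.0)
    print(s, risk_level(s))
```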

As discussed below, the anomaly detection module 96 may also use a relevance feedback algorithm to adapt to the feedback provided by human operators.

As further illustrated in FIG. 1, the anomaly detection module 96 records information about any anomalies it detects in the anomalies database 68, which may be any type of data repository. The information stored in this database 68 for a given anomaly may specify, for example, the ID of the associated item, the one-hour time period in which the anomaly occurred, the actual and forecasted quantity values for that time period, and if a scoring algorithm was used, a score or severity level associated with the anomaly. The actual quantity values from a set of prior one-hour time periods may also be stored to permit subsequent generation and display of a graph of the type shown in FIG. 2.

As depicted in FIG. 1, the reporting module 70 generates alert messages to notify catalog administrators 74 of some or all of the detected anomalies. The alert messages may be sent by email, pager, instant messaging, and/or other communications methods. One example of an email alert message is illustrated in FIG. 3, which is discussed below. Typically, different catalog administrators are responsible for different categories or lines of products. Accordingly, when an anomaly is detected, the reporting module 70 may use a directory (not shown) to look up and notify the specific administrator(s) associated with the corresponding item. The identities of the administrators that receive a given alert may also be dependent upon the severity of the anomaly.

Upon receiving an alert message, the catalog administrator can determine whether an error exists in the item's catalog description, such as by viewing the item's detail page. If an error is found, the administrator can take an appropriate corrective action, such as correcting the error in the catalog, and possibly blocking pending orders for the relevant item from being fulfilled. (Assuming one-hour time intervals are used, the anomaly is typically reported within one hour of its occurrence, allowing pending orders placed at the time of the anomaly to be blocked.) In some embodiments, the task of checking for and correcting the associated catalog error may be partially or fully automated.

FIG. 3 illustrates one example of an email alert message that may be automatically generated and sent by the reporting component 70. The text of the alert message identifies two items for which anomalies were detected within the current one-hour time period. For each such item, the alert message indicates, for the current time period, the actual and forecasted (expected) quantities ordered and the number of distinct users that ordered the item. In addition, the alert message includes a hyperlink to the corresponding item detail page in the catalog, and a hyperlink for viewing a graph of the type shown in FIG. 2 (which may be generated and displayed by the reporting component 70).

In the example shown in FIG. 3, the alert message also includes buttons for the message recipient to provide feedback on whether each anomaly was properly detected and flagged for human review. As depicted by the dashed “feedback” line in FIG. 1, the feedback responses may be recorded in the anomalies database 68 or some other data repository, and may be used by the anomaly detection module 96 to adaptively adjust the sensitivity of the anomaly detection algorithm on an item-by-item basis. FIG. 5, which is discussed below, illustrates one example of how past administrator feedback may be taken into consideration in determining whether an anomaly should be reported. If a catalog administrator fails to respond to an alert message within a selected time period, the reporting module 70 may send the alert message to one or more additional administrators.

Although FIG. 3 illustrates the use of two feedback options (“yes” and “no”), a greater number of options may be provided. For example, message recipients may be prompted to rate the severity of the reported anomaly on a specified scale, such as a scale of 1 to 5 or 1 to 10.

FIG. 4 illustrates an example sequence of steps that may be performed by the anomaly detection engine 32 to process and analyze the data collected during the current time period (e.g., the preceding one-hour period). This sequence of steps may be embodied within a computer program that is executed periodically, such as once per hour. The functions performed by this sequence of steps represent some or all of the functionality of the following components shown in FIG. 1: the cache manager 86, the problem space reduction module 92, the forecasting module 94, the anomaly detection module 96, and the reporting module 70. As will be apparent, the ordering of steps shown in FIG. 4 may be varied.

In step 100 of FIG. 4, the order data collected in the recent orders table 80 over the current time period is aggregated and summarized to create a corresponding aggregation table 82. During this process, many entries (orders) in the recent orders table 80 may be condensed into a single table entry of the aggregation table 82. For example, if thirty distinct users placed orders for a total of forty units of item 1234, a single table entry would be created with the values item ID=1234, quantity=40, and distinct users=30.
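The aggregation in step 100 can be sketched as a grouping over order lines. The `OrderLine` record and the `aggregate` function are hypothetical names introduced for this example; the output fields mirror the aggregation table entry described above (item ID, quantity, distinct users).

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple


class OrderLine(NamedTuple):
    """One ordered item within an order (hypothetical record layout)."""
    item_id: str
    user_id: str
    quantity: int


def aggregate(order_lines: Iterable[OrderLine]) -> Dict[str, Dict[str, int]]:
    """Condense order lines into one entry per item for the time period."""
    total_qty: Dict[str, int] = defaultdict(int)
    users: Dict[str, set] = defaultdict(set)
    for line in order_lines:
        total_qty[line.item_id] += line.quantity
        users[line.item_id].add(line.user_id)  # a set counts each user once
    return {
        item: {"quantity": total_qty[item], "distinct_users": len(users[item])}
        for item in total_qty
    }


if __name__ == "__main__":
    # Thirty users order item 1234; ten of them order two units each.
    lines = [OrderLine("1234", f"user{i}", 2 if i < 10 else 1) for i in range(30)]
    print(aggregate(lines)["1234"])
```

The usage example reproduces the case in the text: forty units ordered by thirty distinct users collapse into a single entry.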

In step 102, which corresponds to the problem space reduction block in FIG. 1, N of the items ordered during the current time period are selected for further analysis. Typically, N represents a small percentage, such as 0.01% to 2%, of the items ordered during the current time period. In step 104, one of the N items is selected as the current item for analysis.

In step 106, an exponential smoothing algorithm is applied to the current item's aggregation table data (quantity values) from prior time periods to calculate the forecasted quantity for the current time period. This step may optionally be performed before the end of the current time period because it relies solely on data from prior time periods. For example, before the end of the current time period, forecasted quantities may be calculated for those items that, based on the activity that has already occurred during the current time period, are predicted to be included in the set of N items. Forecasts for any additional items that end up being selected in step 102 can then be generated at the end of the current time period.

If a double exponential smoothing algorithm is used in step 106, the forecast may be made using the following equations, where F_{t+1} is the forecast for time period t+1, y_t represents the actual observation for time period t, S_t and b_t are the smoothed level and trend values for time period t, and α and γ are smoothing constants between 0 and 1.

F_{t+1} = S_t + b_t    (Equation 2)

S_t = α y_t + (1 − α)(S_{t−1} + b_{t−1})    (Equation 3)

b_t = γ (S_t − S_{t−1}) + (1 − γ) b_{t−1}    (Equation 4)
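Equations 2 through 4 translate directly into a short loop. The sketch below is one possible rendering: the function name is hypothetical, and the starting values (level set to the first observation, trend set to the first difference) are a common convention assumed here because the text does not specify an initialization.

```python
from typing import Sequence


def double_exponential_forecast(history: Sequence[float],
                                alpha: float = 0.8,
                                gamma: float = 0.8) -> float:
    """Return F_{t+1} given observations y_0..y_t (oldest first)."""
    if len(history) < 2:
        raise ValueError("at least two observations are needed")
    # Assumed initialization: S_0 = y_0, b_0 = y_1 - y_0.
    level = float(history[0])
    trend = float(history[1] - history[0])
    for y in history[1:]:
        prev_level = level
        # Equation 3: update the smoothed level.
        level = alpha * y + (1 - alpha) * (prev_level + trend)
        # Equation 4: update the trend estimate.
        trend = gamma * (level - prev_level) + (1 - gamma) * trend
    # Equation 2: forecast for the next period.
    return level + trend


if __name__ == "__main__":
    hourly_quantities = [10, 12, 14, 16]
    print(double_exponential_forecast(hourly_quantities))
```

On a perfectly linear series such as the one in the usage example, the level and trend track the data exactly, so the forecast continues the line.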

In one embodiment, a value of 0.8 is used for each of α and γ. In another embodiment, the forecasting module 94 iteratively selects, for each item, an α and γ that produce a “best match” between the second exponential smoothing curve and the associated time series of observed quantity values; the α and γ values that produce the best match (lowest error) are then used to generate the forecasted quantity for that item.
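The iterative selection of α and γ could, for example, take the form of a simple grid search. In the sketch below, the error measure (sum of squared one-step-ahead forecast errors), the grid spacing of 0.1, and the initialization of the level and trend are all assumptions; the description above does not specify them.

```python
def one_step_sse(observations, alpha, gamma):
    """Sum of squared one-step-ahead errors for given smoothing constants."""
    s = observations[0]
    b = observations[1] - observations[0]   # assumed initialization
    sse = 0.0
    for y in observations[1:]:
        sse += (y - (s + b)) ** 2           # error of the forecast for this period
        s_prev = s
        s = alpha * y + (1 - alpha) * (s_prev + b)
        b = gamma * (s - s_prev) + (1 - gamma) * b
    return sse

def best_constants(observations, step=0.1):
    """Return the (alpha, gamma) grid point with the lowest error."""
    candidates = [round(step * k, 10) for k in range(1, round(1 / step))]
    return min(
        ((a, g) for a in candidates for g in candidates),
        key=lambda ag: one_step_sse(observations, ag[0], ag[1]),
    )

series = [10, 12, 13, 15, 18, 17, 20]
alpha, gamma = best_constants(series)
```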

In step 108, the forecasted and actual quantity values, and optionally other types of data, are used to evaluate whether an anomaly exists in the current item's order data. This evaluation may be performed using one of the methods described above or another appropriate method, and may optionally take into consideration prior feedback provided by catalog administrators. If an anomaly is detected in step 108, it is recorded in the anomalies database 68 as depicted in step 110, and an alert message is generated and sent to a catalog administrator.

As will be apparent, steps 106 (forecasting) and 108 (anomaly detection) may, in practice, be combined. For example, the two steps may be embodied within a single formula or function that generates a yes/no response based on the item's actual quantity values for the current and prior time periods.
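A combined function of this kind might look like the following sketch. The decision rule shown (flagging the item when the actual quantity exceeds a fixed multiple of the forecast) is a hypothetical stand-in chosen for illustration and is not the scoring method described above; the parameters `ratio` and `min_expected` are likewise assumptions.

```python
def is_anomalous(history, actual, alpha=0.8, gamma=0.8,
                 ratio=3.0, min_expected=1.0):
    """Return True/False from prior-period quantities and the current quantity.

    Forecasting (step 106) and evaluation (step 108) are folded into one call.
    """
    s = history[0]
    b = history[1] - history[0]             # assumed initialization
    for y in history[1:]:
        s_prev = s
        s = alpha * y + (1 - alpha) * (s_prev + b)
        b = gamma * (s - s_prev) + (1 - gamma) * b
    # Floor the forecast so low-volume items are not flagged on tiny counts.
    expected = max(s + b, min_expected)
    return actual > ratio * expected        # assumed decision rule

spike = is_anomalous([10, 11, 10, 12, 11], actual=200)
normal = is_anomalous([10, 11, 10, 12, 11], actual=12)
```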

As mentioned above, one possible variation to the illustrated embodiment is to forecast and compare the number of distinct users that order the item, rather than (or in addition to) forecasting and comparing the total item quantity. Specifically, in step 106, the number of distinct users that acquired the current item in prior time periods can be used to predict the number of distinct users for the current period. This number can then be compared, in step 108, to the actual number of distinct users that acquired the item during the current time period. With this variation, all of the components depicted in FIG. 1, and all of the steps shown in FIG. 4, may otherwise be substantially the same as described herein. Other measures of the demand or order volume for the particular item may also be used, such as total dollar amount spent on the item during the relevant time period, or the total number of orders received that include one or more units of the item. Thus, steps 106 and 108 may more generally be performed so as to predict the “demand” for the current item and time period, and to compare this prediction to the actual or observed demand, where “demand” may be predicted and measured in terms of the total quantity (number of units) of the item ordered, the total number of distinct users who order the item, the total number of orders received that include one or more units of the item, the total dollar amount spent by users on the item, and/or other criteria. Other types of events reflective of item demand levels, such as the addition of an item to an online shopping cart or wish list, may also be taken into consideration.
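The alternative demand measures listed above can each be derived from the same order data, as the following sketch suggests. The order-line fields (`order_id`, `user_id`, `item_id`, `quantity`, `unit_price`) and the function name are hypothetical.

```python
def demand_measures(order_lines, item_id):
    """Compute four alternative demand measures for one item and time period."""
    lines = [ln for ln in order_lines if ln["item_id"] == item_id]
    return {
        "quantity": sum(ln["quantity"] for ln in lines),
        "distinct_users": len({ln["user_id"] for ln in lines}),
        "orders": len({ln["order_id"] for ln in lines}),
        "dollars": sum(ln["quantity"] * ln["unit_price"] for ln in lines),
    }

order_lines = [
    {"order_id": "o1", "user_id": "u1", "item_id": 1234, "quantity": 2, "unit_price": 2.5},
    {"order_id": "o2", "user_id": "u1", "item_id": 1234, "quantity": 1, "unit_price": 2.5},
    {"order_id": "o3", "user_id": "u2", "item_id": 5678, "quantity": 4, "unit_price": 1.0},
]
measures = demand_measures(order_lines, 1234)
```

Any one of these values could then be supplied, period by period, to the forecasting step in place of the total quantity.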

As depicted by the loop that includes step 114, steps 106-108 are repeated for each additional item in the set of N items until the last item is reached in step 112. The order data stored in the recent orders table 80 for the current time period may then be purged, as shown in step 116.

FIG. 5 illustrates one example of how administrator feedback may be taken into consideration in block 108 of FIG. 4. In this example, it is assumed that a score is generated for the current item, and that this score is compared to a threshold to determine whether an anomaly exists. In step 108A, the score is generated for the current item, optionally using equation 1 above. In step 108B, prior administrator feedback is used to calculate an adjustment for the current item. The adjustment may be calculated by subtracting the number of false positives reported for this item from the number of properly flagged anomalies. In step 108C, the adjustment is multiplied by a weighting factor W, and the result is added to the score to generate an adjusted score. Thus, if the number of properly flagged anomalies exceeds the number of false positives, the score will be increased; and if the number of properly flagged anomalies is less than the number of false positives, the score will be decreased. Finally, in step 108D, the adjusted score is compared to a predefined score threshold (or possibly multiple score thresholds) to determine whether an anomaly exists.
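Steps 108B through 108D may be sketched as follows. The function names are hypothetical, and the sketch assumes that an anomaly is indicated when the adjusted score exceeds a single threshold; the direction of the comparison is an assumption of this example.

```python
def adjusted_score(score, properly_flagged, false_positives, weight):
    """Apply the feedback adjustment of steps 108B and 108C to a raw score."""
    adjustment = properly_flagged - false_positives      # step 108B
    return score + weight * adjustment                   # step 108C

def anomaly_exists(score, properly_flagged, false_positives, weight, threshold):
    """Step 108D, assuming a single threshold that the score must exceed."""
    return adjusted_score(score, properly_flagged, false_positives, weight) > threshold

raised = adjusted_score(5.0, properly_flagged=3, false_positives=1, weight=0.5)
lowered = adjusted_score(5.0, properly_flagged=1, false_positives=3, weight=0.5)
```

With W = 0.5, a raw score of 5.0 becomes 6.0 when properly flagged anomalies outnumber false positives by two, and 4.0 in the reverse case.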

Numerous variations to the approach shown in FIG. 5 are possible. For example, when administrator feedback is provided in connection with a reported anomaly, the feedback may also be taken into consideration, to a lesser extent, in subsequently evaluating order anomalies for other items in the same item category. Further, rather than adjusting a score, one or more thresholds may be adjusted in response to the feedback.

As will be appreciated from the foregoing, the disclosed architecture can easily be scaled by adding additional computers. For example, assuming a single computer is initially used to implement the anomaly detection engine 32, the number of items for which an anomaly analysis is conducted each time period can be approximately doubled by adding a second computer. This second computer can be a replicated version of the first computer (i.e., can include all of the components and modules shown in block 62 of FIG. 1), but programmed to select a different set of N items for which to conduct the analysis. Thus, for example, if the first computer selects the N items having the highest quantities in the current period (see step 102 of FIG. 4), the second computer can be configured to select the next N items with the highest quantities. Numerous other approaches for dividing the anomaly engine's functionality between computers are also possible. In addition, a single anomaly engine 32 may be configured to monitor orders from multiple, distinct web sites and electronic catalogs.
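The division of items between replicated computers could be sketched as below, where each computer is given a zero-based index and takes its own block of N items from the quantity ranking. The index parameter and the function name are assumptions of this sketch.

```python
def select_partition(aggregation, n, machine_index):
    """Return the block of n items assigned to one computer.

    Index 0 takes the n highest-quantity items, index 1 the next n, and so on.
    """
    ranked = sorted(
        aggregation,
        key=lambda item_id: aggregation[item_id]["quantity"],
        reverse=True,
    )
    start = machine_index * n
    return ranked[start:start + n]

aggregation = {
    1: {"quantity": 50},
    2: {"quantity": 40},
    3: {"quantity": 30},
    4: {"quantity": 20},
}
first_computer = select_partition(aggregation, 2, 0)
second_computer = select_partition(aggregation, 2, 1)
```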

The invention may also be applied where some or all of the orders are placed without the use of an electronic catalog. For example, the invention is applicable to systems that accept orders from recipients of a paper catalog that describes items that can be ordered. To select an item to order in such a system, the user may, for example, scan in a corresponding bar code label from the paper catalog using a PDA or a digital pen, or may specify a product identifier using a computer keyboard, a telephone keypad, or automated voice recognition. The components and algorithms used in such paper-catalog-based embodiments may be substantially the same as those shown in the drawings and described above. The invention may also be used in systems that accept orders placed from electronic catalogs that are distributed by CD, DVD, disk, tape, or other types of information storage medium.

Although this invention has been described in terms of certain specific embodiments and applications, other embodiments and applications that are apparent to those of ordinary skill in the art, including embodiments that do not provide all of the features and advantages set forth herein, are also within the scope of this invention. Accordingly, the scope of the present invention is defined only by the appended claims.

1. A computer-implemented method of detecting anomalous user activity associated with items in an electronic catalog, the method comprising: storing order data descriptive of orders placed by users for items from an electronic catalog of items; identifying, from the order data, a set of items ordered by users from the electronic catalog during a current time period; via execution of instructions by a computing device, selecting, from the set of items, a subset of items for which to conduct an anomaly analysis, so as to control a computational processing load associated with the anomaly analysis; for each item in the subset, (a) calculating a forecasted demand for the respective item in the current time period based on observed demand for the respective item in prior time periods, as reflected by said order data, and (b) evaluating whether order activity for the respective item is anomalous based on at least the forecasted demand for the respective item and an observed demand for the respective item in the current time period; and in response to detection of anomalous order activity in (b), generating an alert message that identifies an item associated with the anomalous order activity.
2. The method of claim 1, wherein calculating a forecasted demand comprises calculating a forecasted quantity of the respective item ordered during the current time period.
3. The method of claim 1, wherein calculating a forecasted demand comprises calculating a forecasted number of distinct users that order the item during the current time period.
4. The method of claim 1, wherein storing order data descriptive of orders comprises storing aggregated order data for each of a plurality of items and time periods, and the method comprises using the aggregated order data to calculate the forecasted demand for each item in the subset.
5. The method of claim 1, wherein the forecasted demand for each item is calculated using an exponential smoothing algorithm.
6. The method of claim 5, wherein the exponential smoothing algorithm is a double exponential smoothing algorithm.
7. The method of claim 1, wherein the forecasted demand for each item is calculated using at least one of (a) a moving average algorithm, (b) a Holt-Winters algorithm, and (c) a multiple linear regression algorithm.
8. The method of claim 1, wherein the subset of items is selected based, at least in part, on quantities of items ordered during the current time period.
9. The method of claim 8, wherein the subset of items is selected based additionally on item price data.
10. The method of claim 1, wherein the forecasted demand for at least some of the items in the subset is calculated in (a) during the current time period.
11. The method of claim 1, wherein the forecasted demand for at least some of the items in the subset is calculated in (a) after the current time period.
12. The method of claim 1, wherein the method is performed by a single, general purpose computer that monitors order activity associated with the electronic catalog.
13. The method of claim 1, wherein the current time period has a duration falling in the range of one minute to six hours.
14. The method of claim 1, wherein the alert message provides an option for a recipient thereof to provide feedback reflective of whether the anomalous order activity was properly detected.
15. The method of claim 1, wherein evaluating whether the order activity for the respective item is anomalous comprises taking into consideration prior human feedback provided in response to at least one prior alert message generated in association with the respective item.
16. The method of claim 1, wherein evaluating whether the order activity for the respective item is anomalous additionally comprises taking into consideration a number of distinct users that ordered the respective item during the current time period.
17. The method of claim 1, wherein the method comprises using a single formula that combines both (a) and (b), and in response to detection of anomalous order activity, storing information about the anomalous order activity in a data repository.
18. The method of claim 1, wherein the subset of items comprises physical products that are shipped to users.
19. The method of claim 1, wherein selecting the subset of items comprises using order data to select items ordered the most frequently during a selected time period.
20. The method of claim 1, wherein selecting the subset of items comprises taking into consideration, for each item in said set of items, a total quantity of the item ordered during the current time period and a price of the item.
21. The method of claim 1, wherein the subset of items is selected so that the anomaly analysis for the items in the subset is performed in less than the duration of the current time period, said duration being no more than three hours.
22. The method of claim 1, wherein the subset of items is selected so that the anomaly analysis for the items in the subset is performed prior to fulfillment of orders placed for the items during the current time period.
23. The method of claim 1, wherein the anomaly analysis is performed substantially in real time.
24. The method of claim 1, wherein the method comprises detecting the anomalous order activity substantially in real time with a sustained order rate of over 10³ orders per minute and a catalog size of over 10⁸ items.
25. The method of claim 1, wherein evaluating whether the order activity for the respective item is anomalous comprises comparing the forecasted demand for the respective item to the observed demand for the respective item.
26. The method of claim 1, wherein each of said time periods has a duration of no more than three hours.
27. The method of claim 1, wherein the method is performed periodically by a computer system that comprises one or more computing devices.
28. The method of claim 1, further comprising, in response to the alert message, assessing whether the anomalous order activity is a result of an erroneous item description in the electronic catalog.
29. The method of claim 1, wherein the method in its entirety is automatically performed by a machine that comprises one or more computing devices.
30. A computer-implemented method of detecting anomalous user activity associated with items in an electronic catalog, the method comprising: storing order data descriptive of orders placed by users for items from an electronic catalog of items; identifying, from the order data, a set of items ordered by users from the electronic catalog during a current time period; via execution of instructions by a computing system, selecting, from the set of items, a subset of items for which to conduct an anomaly analysis, so as to control a computational processing load associated with the anomaly analysis; for each item in the subset, (a) calculating a forecasted demand for the respective item in the current time period based on observed demand for the respective item in prior time periods, as reflected by said order data, and (b) evaluating whether order activity for the respective item is anomalous based on at least the forecasted demand for the respective item and an observed demand for the respective item in the current time period; and in response to detection of anomalous order activity in (b), generating an alert message that identifies an item associated with the anomalous order activity, wherein the alert message comprises a hyperlink to a catalog page that describes the item for which the anomalous order activity was detected, such that a recipient of the alert message can access the catalog page to evaluate whether the anomalous order activity resulted from an error in the catalog.
31. The method of claim 30, wherein the method in its entirety is automatically performed by a machine that comprises one or more computing devices.
32. A computer-readable medium having stored thereon a set of program modules that, when executed by a computer, cause the computer to perform a method of detecting anomalous user activity associated with items in an electronic catalog, the method comprising: storing order data descriptive of orders placed by users for items from an electronic catalog of items; identifying, from the order data, a set of items ordered by users from the electronic catalog during a current time period; selecting, from the set of items, a subset of items for which to conduct an anomaly analysis, so as to control a computational processing load associated with the anomaly analysis; for each item in the subset, (a) calculating a forecasted demand for the respective item in the current time period based on observed demand for the respective item in prior time periods, as reflected by said order data, and (b) evaluating whether order activity for the respective item is anomalous based on at least the forecasted demand for the respective item and an observed demand for the respective item in the current time period; and in response to detection of anomalous order activity in (b), generating an alert message that identifies an item associated with the anomalous order activity.
33. A system for detecting anomalous user activity associated with items in a catalog, comprising: a data repository that stores aggregated data descriptive of orders placed by users from a catalog of items, said aggregated data arranged by time period; a forecasting module that analyzes demand levels in prior time periods on an item-by-item basis for at least some items identified as ordered by users during a current time period, as indicated by the aggregated data, to predict demand levels for respective items in the current time period; an anomaly detection module that detects anomalies associated with specific items in the catalog at least by comparing the demand levels predicted by the forecasting module to corresponding observed demand levels; a reporting module that generates alert messages to notify catalog administrators of items for which anomalies are detected by the anomaly detection module; and computer hardware that executes the forecasting module, the anomaly detection module, and the reporting module, the computer hardware comprising one or more computers.
34. The system of claim 33, wherein the forecasting module predicts an item's demand level in terms of at least one of the following: (a) total quantity of the item ordered in a time period, (b) total number of distinct users who order the item in a time period, (c) total number of orders received in a time period that include one or more units of the item, (d) total dollar amount spent by users on the item in a time period.
35. The system of claim 33, wherein the forecasting module predicts an item's demand level in the current time period by predicting a total quantity of the item ordered during the current time period.
36. The system of claim 33, further comprising a listener that passively monitors network traffic to detect new orders, and which stores information about the new orders in the data repository.
37. The system of claim 33, further comprising a problem space reduction module that selects, from the items identified as ordered during the current time period, a subset of items for which to conduct an anomaly analysis, so as to reduce a processing load associated with execution of the forecasting and anomaly detection modules.
38. The system of claim 33, wherein the anomaly detection module implements a relevance feedback algorithm to adapt to human feedback provided in association with detected anomalies.
39. The system of claim 33, wherein the forecasting module calculates the forecasted demand levels using an exponential smoothing algorithm.
40. The system of claim 39, wherein the exponential smoothing algorithm is a double exponential smoothing algorithm.
41. The system of claim 33, wherein the forecasting module calculates the forecasted demand levels using at least one of the following: (a) an exponential smoothing algorithm, (b) a moving average algorithm, (c) a Holt-Winters algorithm, (d) a multiple linear regression algorithm.
42. The system of claim 33, wherein the forecasting module, the anomaly detection module, and the reporting module run on a single, general purpose computer.
43. The system of claim 33, wherein each time period has a duration falling in the range of one minute to six hours.
44. The system of claim 33, wherein the reporting module generates alert messages that provide an option for recipients thereof to provide feedback reflective of whether the anomalies described in such messages were properly detected.
45. The system of claim 33, wherein the system detects said anomalies substantially in real time.
46. The system of claim 33, wherein the system detects said anomalies substantially in real time with a sustained order rate of over 10³ orders per minute and a catalog size of over 10⁸ items.
47. The system of claim 33, wherein each of said time periods has a duration of no more than three hours.
48. The system of claim 37, wherein the problem space reduction module uses at least order quantity information and item price information to select the subset of items.
49. The system of claim 37, wherein the problem space reduction module selects the subset of items such that the anomaly detection module detects said anomalies prior to fulfillment of corresponding orders.
50. The system of claim 37, wherein the problem space reduction module selects the subset of items such that the anomaly detection module detects the anomalies substantially in real time.
51. A system for detecting anomalous user activity associated with items in a catalog, comprising: a data repository that stores aggregated data descriptive of orders placed by users from a catalog of items, said aggregated data arranged by time period; a forecasting module that analyzes item demand levels in prior time periods on an item-by-item basis, as indicated by the aggregated data, to predict demand levels for respective items in a current time period; an anomaly detection module that detects anomalies associated with specific items in the catalog at least by comparing the demand levels predicted by the forecasting module to corresponding observed demand levels; a reporting module that generates alert messages to notify catalog administrators of items for which anomalies are detected by the anomaly detection module; and computer hardware that executes the forecasting module, the anomaly detection module, and the reporting module, the computer hardware comprising one or more computers, wherein the reporting module generates alert messages that include hyperlinks to electronic catalog pages of associated items, to facilitate determinations of whether the detected anomalies are attributable to errors in the catalog.
52. A computer-implemented method of detecting anomalous user activity associated with use of an electronic catalog, the method comprising: selecting an item ordered from an electronic catalog of items by a plurality of users; determining, via execution of instructions by a machine that comprises one or more computing devices, whether an anomaly exists in user activity data associated with the item at least by comparing an actual demand for the item in a current time period to an expected demand that is based on observed demand levels for the item in prior time periods; and in response to determining that an anomaly exists, generating an alert message that identifies the item and provides a link for accessing a catalog description of the item, to thereby assist a human operator in determining whether the anomaly is a result of an erroneous description of the item in the electronic catalog; wherein the alert message is generated prior to fulfillment of orders placed for the item during the current time period.
53. The method of claim 52, wherein the current time period has a duration falling in the range of one minute to six hours.
54. The method of claim 53, wherein the alert message is generated within one hour of an end of the current time period.
55. The method of claim 53, wherein determining whether an anomaly exists comprises using a forecasting algorithm to calculate said expected demand.
56. The method of claim 53, wherein determining whether an anomaly exists comprises taking into consideration feedback provided by human operators in response to prior anomaly alert messages.
57. The method of claim 52, wherein the step of determining whether an anomaly exists is performed substantially in real time.
58. The method of claim 52, wherein each of said time periods has a duration of no more than three hours.
59. The method of claim 52, wherein the method in its entirety is performed automatically by a computer system that comprises one or more computing devices.
60. The method of claim 52, further comprising, in response to the alert message, assessing whether the anomaly is a result of an erroneous description of the item in the electronic catalog.